1. INTRODUCTION

Diabetes mellitus (DM) is one of the most widespread diseases in the world. At present, about 425 million people are affected worldwide, and this number is expected to reach 700 million by 2045 [1]. DM is a chronic metabolic disease caused either by the pancreas not producing enough insulin or by the body's cells not responding to the insulin that is produced. The resulting high blood sugar leads to many health disorders. According to the World Health Organization (WHO) and the American Diabetes Association (ADA), DM is classified into four types [2]-[4]: i) type-I-DM, or insulin-dependent diabetes mellitus (IDDM), which results from the body's failure to produce insulin due to the destruction of the insulin-producing cells of the pancreas. It is usually diagnosed in children and young adults, and approximately 5%-10% of all diabetes cases are of this type; ii) type-II-DM, also called non-insulin-dependent diabetes mellitus (NIDDM) or "adult-onset diabetes", which is the most common type, accounting for about 90% of diabetics. It results from the failure of the body's cells to respond to the secreted insulin, which leads to an increase in blood sugar levels; iii) gestational diabetes (GDM), which develops in about 4% of pregnant women due to pregnancy-related changes in the body; it usually resolves after delivery but increases the risk of later developing type-II-DM; iv) other specific types of diabetes, which are rare and caused by genetic and metabolic disorders.
Diabetes is one of the main causes of the increasing number of deaths in the world, especially Type-II-DM, which is the most common type [4]. When it is neglected and left untreated, many serious health disorders occur, such as heart attack, myocardial infarction, stroke, renal failure, blindness, neuropathy, gangrene, microvascular damage, and increased susceptibility to infection [5], [6]. Its spread also places a great strain on the public health system [7], [8]. Early detection and diagnosis are therefore essential.

In recent years, many research works have used machine learning (ML) algorithms to detect and diagnose DM using the Pima Indian diabetes dataset (PIDD) [9]-[11]. Patel and Tamani [11] showed that the logistic regression (LR) and gradient boost (GB) algorithms achieved a higher accuracy than the other algorithms, at 79%. Patil et al. [12] proposed an approach based on the Mayfly algorithm for feature selection and a support vector machine (SVM) classifier to diagnose Type-II-DM; it reached an accuracy of 94.5%, higher than that of the other studies it was compared with. Panda et al. [13] used four ML algorithms, SVM, k-nearest neighbor (KNN), LR, and GB, to predict DM; GB outperformed the other algorithms with the highest accuracy of 81.25%. Alalwan [14] proposed two data mining models, the self-organizing map (SOM) and the random forest algorithm (RFA); the experiments showed that SOM outperformed RFA, reaching an accuracy of 85%. Rajni and Amandeep [15] used the RB-Bayes algorithm, which reached 72.9%, the highest prediction accuracy compared with the other algorithms. Bozkurt et al. [16] used six different neural networks to classify DM patients; distributed time delay networks (DTDN) performed best, with an accuracy of 76.00%. Rahman and Afroz [17] used data mining tools to compare different classification techniques: multilayer perceptron (MLP), BayesNet, naive Bayes, J48graft, fuzzy lattice reasoning (FLR), JRip (JRipper), fuzzy inference system (FIS), and adaptive neuro-fuzzy inference system (ANFIS); the J48graft classifier performed best, with an accuracy of 81.33%. Khashei et al. [18] constructed a hybrid MLP model based on soft computing and artificial intelligence techniques; the hybrid MLP outperformed the other methods with an accuracy of 81.2%. Marcano-Cedeño et al. [19] proposed AM-MLP, a prediction model that combines artificial metaplasticity (AM) with an MLP to predict diabetes; it obtained an accuracy of 89.93%. Karegowda et al. [20] presented GA-BPN, a hybrid approach that combines a genetic algorithm (GA) with a back-propagation network (BPN), where the GA optimizes the weights of the BPN. The GA-BPN model reached an accuracy of 84.713%, which was better than that of the BPN without GA. Fiuzy et al. [21] proposed a model based on three techniques: a fuzzy system for fast and precise decision making, ant colony optimization (ACO) to select the best rules of the fuzzy system, and an artificial neural network (ANN) for modeling, structure identification, and parameter identification; this model reached an accuracy of 95.852%. Haritha et al. [22] used the firefly and cuckoo search algorithms to reduce the dimension of the UCI type I and type II diabetes datasets and then classified them with a traditional KNN classifier and a fuzzy KNN; on UCI type II, the accuracy was 71.3% for firefly-fuzzy-KNN and 74.8% for cuckoo-fuzzy-KNN. Zhang et al. [23] used a multi-layer feed-forward neural network to predict DM, which reached an accuracy of 82%.

The main objective of this paper is to propose a hybrid model for diagnosing diabetes that helps health practitioners diagnose the disease, control it, and avoid its dangers, and that raises health awareness in the community. The proposed model, coyote optimization algorithm and least squares support vector machine (COA-LS-SVM), combines the COA algorithm with the LS-SVM classifier. COA is used to find the optimal values of the LS-SVM parameters, which overcomes the sensitivity of LS-SVM to changes in its parameter values, and the LS-SVM classifier is used to classify Type-II-DM. What distinguishes COA from other algorithms is the balance it achieves between exploration and exploitation during the optimization process. For this reason, this paper uses COA, for the first time, to find the optimal values of the LS-SVM parameters. This study also compares the performance of the proposed approach with that of other approaches. The experimental results demonstrate the strength of the proposed COA-LS-SVM model, whose average accuracy of 98.811% outperforms the other algorithms. The rest of the paper is structured as follows: section 2 presents the COA and LS-SVM algorithms; section 3 describes the proposed model and the data set; section 4 covers the experimental results; finally, section 5 presents the conclusion and future work.


2. OVERVIEW OF METHODOLOGIES 

The present work proposes a model, COA-LS-SVM, based on the COA algorithm and the LS-SVM classifier. The proposed model is used to classify DM patients accurately. The next two subsections provide an overview of these algorithms.


2.1. Coyote optimization algorithm (COA) 

Swarm intelligence algorithms (SIAs) are inspired by the social behavior of creatures and are used to solve many problems [24]-[26]. One of the recent algorithms of this kind is COA, a meta-heuristic algorithm for global optimization problems proposed by Pierezan et al. in 2018 [27]. COA is inspired by the Canis latrans species, which lives mainly in North America [27]. The algorithm models the social organization of coyotes and adapts it to an algorithmic structure. An important advantage of this method is that it maintains a balance between the exploitation and exploration phases during optimization [28]. Unlike grey wolf optimization (GWO), COA does not follow the hierarchy and dominance rules of wolves, and it does not rely only on prey hunting as GWO does, but also on the social structure and the exchange of experiences among coyotes. Coyotes move toward the prey as a group, which gives them a cooperative trait, but they devour the prey individually [29]. Coyotes locate prey through their strong sense of smell. The hunt takes place as the coyotes attack the prey in groups, which requires the agents to update their positions toward the best one. When coyotes are threatened by their rivals, they move to a new position at a large random distance from their current position. COA is formulated as follows [30]-[33]:

COA is built on the social condition $soc_c^{p,t}$ of the $c$-th coyote in the $p$-th pack at time instant $t$, which is defined over the decision variables $\vec{x}$ as:

$$soc_c^{p,t} = \vec{x} = (x_1, x_2, \ldots, x_D) \qquad (1)$$


where $D$ is the dimension of the search space. COA starts by initializing the global coyote population: the social condition of the $c$-th coyote in the $p$-th pack for the $j$-th dimension is set as:

$$soc_{c,j}^{p,t} = lb_j + r_j \,(ub_j - lb_j) \qquad (2)$$


where $r_j \in [0,1]$ is a real random number, and $lb_j$ and $ub_j$ are the lower and upper bounds of the $j$-th decision variable. The fitness of each coyote in its current social condition is calculated by (3):

$$fit_c^{p,t} = f\left(soc_c^{p,t}\right) \qquad (3)$$
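Equations (2) and (3) amount to sampling each decision variable uniformly inside its bounds and then scoring the sampled coyote. The minimal Python sketch below illustrates this under stated assumptions: the `sphere` objective, the bounds, and the pack sizes are hypothetical and chosen only for illustration, not taken from this paper.

```python
import random

def init_packs(n_packs, n_coyotes, lb, ub, rng):
    """Eq. (2): soc_{c,j} = lb_j + r_j * (ub_j - lb_j) with r_j ~ U[0, 1)."""
    return [[[lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]
             for _ in range(n_coyotes)]
            for _ in range(n_packs)]

def sphere(x):
    """Hypothetical test objective, minimized at the origin."""
    return sum(v * v for v in x)

rng = random.Random(42)
lb, ub = [-5.0] * 3, [5.0] * 3                            # D = 3 decision variables (illustrative)
packs = init_packs(2, 5, lb, ub, rng)                     # N_p = 2 packs of N_c = 5 coyotes
fits = [[sphere(soc) for soc in pack] for pack in packs]  # Eq. (3)
```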


Coyotes randomly leave their packs and join other ones, so the pack membership changes over time. This behavior is governed by the probability $P_e$ of a coyote leaving its pack, which depends on the number of coyotes per pack $N_c$:

$$P_e = 0.005 \cdot N_c^2 \qquad (4)$$


Because $P_e$ exceeds 1 for $N_c \geq 15$, the number of coyotes per pack is restricted to $N_c \leq 14$. The alpha coyote, defined in (5), is the best solution of the $p$-th pack at time instant $t$, so each pack has exactly one alpha in each iteration:

$$alpha^{p,t} = \left\{ soc_c^{p,t} \;\middle|\; \arg_{c \in \{1, 2, \ldots, N_c\}} \min f\left(soc_c^{p,t}\right) \right\} \qquad (5)$$
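The pack-size limit that follows from (4) and the alpha selection in (5) can both be checked numerically. In this small sketch, `leave_probability` and `alpha_of` are hypothetical helper names, and the toy pack uses the sum of squares as its fitness purely for illustration.

```python
def leave_probability(n_c):
    """Eq. (4): probability that a coyote leaves its pack."""
    return 0.005 * n_c ** 2

# Largest pack size for which P_e is still a valid probability (<= 1)
max_nc = max(n for n in range(1, 100) if leave_probability(n) <= 1.0)

def alpha_of(pack, fits):
    """Eq. (5): the alpha is the coyote with the minimum fitness in its pack."""
    best = min(range(len(pack)), key=lambda c: fits[c])
    return pack[best]

pack = [[1.0, 2.0], [0.1, -0.2], [3.0, 0.0]]
fits = [sum(v * v for v in soc) for soc in pack]   # hypothetical objective
print(max_nc, alpha_of(pack, fits))
```

With $N_c = 14$ the probability is $0.005 \cdot 196 = 0.98$, while $N_c = 15$ already gives $1.125$.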


All the coyotes' information in a pack is combined into the culture transformation of the pack, that is, the per-dimension median of the coyotes' social conditions:

$$cult_j^{p,t} = \begin{cases} O_{\frac{N_c+1}{2},\,j}^{p,t}, & N_c \text{ is odd} \\[4pt] \dfrac{O_{\frac{N_c}{2},\,j}^{p,t} + O_{\frac{N_c}{2}+1,\,j}^{p,t}}{2}, & \text{otherwise} \end{cases} \qquad (6)$$

where $O^{p,t}$ denotes the ranked social conditions of the coyotes in the $p$-th pack at time instant $t$. The birth and death of coyotes are two important events in COA; the age of a coyote is denoted $age_c^{p,t} \in \mathbb{N}$. A new coyote (pup) is born from the social conditions of two randomly chosen parents combined with the influence of the environment, as in (7):


$$pup_j^{p,t} = \begin{cases} soc_{r_1,j}^{p,t}, & rnd_j < P_s \text{ or } j = j_1 \\ soc_{r_2,j}^{p,t}, & rnd_j \geq P_s + P_a \text{ or } j = j_2 \\ R_j, & \text{otherwise} \end{cases} \qquad (7)$$

where $r_1$ and $r_2$ are random coyotes from the $p$-th pack, $j_1$ and $j_2$ are random dimensions of the problem, $P_s$ and $P_a$ are the scatter and association probabilities, respectively, which govern the cultural diversity of the coyotes in the pack, $R_j$ is a random number within the bounds of the $j$-th decision variable, and $rnd_j$ is a random number in $[0,1]$ drawn from a uniform distribution. $P_s$ and $P_a$ are calculated as:


$$P_s = 1/D \qquad (8)$$

$$P_a = (1 - P_s)/2 \qquad (9)$$
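The birth rule (7) can be sketched as follows. This is an illustrative Python sketch, not the authors' code: `make_pup` is a hypothetical name, the association probability is taken as $P_a = (1 - P_s)/2$ as in the original COA formulation [27], the conditions of (7) are checked in the order written, and the sketch assumes $D \geq 2$ and at least two coyotes per pack.

```python
import random

def make_pup(pack, lb, ub, rng):
    """Eqs. (7)-(9): a pup inherits genes from two random parents or the environment."""
    D = len(lb)
    p_s = 1.0 / D                            # Eq. (8): scatter probability
    p_a = (1.0 - p_s) / 2.0                  # association probability (original COA)
    r1, r2 = rng.sample(range(len(pack)), 2) # two distinct parents
    j1, j2 = rng.sample(range(D), 2)         # one forced dimension per parent
    pup = []
    for j in range(D):
        rnd = rng.random()
        if rnd < p_s or j == j1:             # conditions checked in the order of (7)
            pup.append(pack[r1][j])
        elif rnd >= p_s + p_a or j == j2:
            pup.append(pack[r2][j])
        else:                                # R_j: random value inside the bounds
            pup.append(lb[j] + rng.random() * (ub[j] - lb[j]))
    return pup

rng = random.Random(1)
lb, ub = [0.0] * 4, [10.0] * 4
pack = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 9.0, 9.0, 9.0]]
pup = make_pup(pack, lb, ub, rng)
```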


The life cycle of COA follows three rules, shown in pseudo-code-1 [31]-[36]:


The pseudo-code-1. Life cycle rules of COA

Calculate ω and φ
(ω is the group of coyotes in the pack that are worse adapted than the pup; φ is the number of coyotes in ω)
If φ = 1
    The pup survives while the only coyote in ω dies
Else if φ > 1
    The pup survives while the oldest coyote in ω dies
Else
    The pup dies
End if
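The life-cycle rules above can be sketched as a single replacement step. This Python sketch assumes minimization, so "worse adapted" means a higher fitness value, and it resets the age of a surviving pup to zero; `life_cycle` is a hypothetical name.

```python
def life_cycle(pack, fits, ages, pup, pup_fit):
    """Pseudo-code-1 (minimization): return True if the pup survives."""
    # omega: coyotes worse adapted (higher fitness) than the pup; phi = |omega|
    omega = [c for c in range(len(pack)) if fits[c] > pup_fit]
    phi = len(omega)
    if phi == 0:
        return False                                   # the pup dies
    if phi == 1:
        victim = omega[0]                              # the only coyote in omega dies
    else:
        victim = max(omega, key=lambda c: ages[c])     # the oldest coyote in omega dies
    pack[victim], fits[victim], ages[victim] = pup, pup_fit, 0
    return True

pack, fits, ages = [[0.0], [1.0], [2.0]], [0.0, 1.0, 4.0], [3, 5, 1]
survived = life_cycle(pack, fits, ages, [0.5], 0.25)   # replaces the coyote aged 5
```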


The cultural adaptation of the coyotes within each pack is determined by two factors, the alpha influence $\delta_1$ and the pack influence $\delta_2$:

$$\delta_1 = alpha^{p,t} - soc_{cr_1}^{p,t} \qquad (10)$$

$$\delta_2 = cult^{p,t} - soc_{cr_2}^{p,t} \qquad (11)$$

where $cr_1$ and $cr_2$ are random coyotes of the pack. The social condition of the coyote is updated as:

$$new\_soc_c^{p,t} = soc_c^{p,t} + rd_1 \cdot \delta_1 + rd_2 \cdot \delta_2 \qquad (12)$$


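where $rd_1$ and $rd_2$ are random numbers in the range $[0,1]$. Finally, the fitness of the new social condition is evaluated by (13), and the social condition is updated by (14), which keeps the new condition only if it improves the fitness:

$$new\_fit_c^{p,t} = f\left(new\_soc_c^{p,t}\right) \qquad (13)$$

$$soc_c^{p,t+1} = \begin{cases} new\_soc_c^{p,t}, & new\_fit_c^{p,t} < fit_c^{p,t} \\ soc_c^{p,t}, & \text{otherwise} \end{cases} \qquad (14)$$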
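A minimal Python sketch of one social-condition update, covering (6) and (10)-(14), is shown below. Assumptions that are not stated in the text: $cr_1$ and $cr_2$ are drawn from the other coyotes of the pack, $rd_1$ and $rd_2$ are scalars per coyote, and the problem is a minimization with a hypothetical sum-of-squares objective. `statistics.median` averages the two middle values for an even pack size, which matches (6).

```python
import random
import statistics

def culture(pack):
    """Eq. (6): per-dimension median of the pack's social conditions."""
    return [statistics.median(col) for col in zip(*pack)]

def update_coyote(pack, fits, c, alpha, cult, f, rng):
    """Eqs. (10)-(14) for coyote c (minimization, greedy acceptance)."""
    others = [k for k in range(len(pack)) if k != c]
    cr1, cr2 = rng.choice(others), rng.choice(others)   # random pack mates (assumption)
    rd1, rd2 = rng.random(), rng.random()               # scalars in [0, 1)
    new_soc = [s + rd1 * (a - s1) + rd2 * (m - s2)      # Eqs. (10)-(12)
               for s, a, s1, m, s2 in zip(pack[c], alpha, pack[cr1], cult, pack[cr2])]
    new_fit = f(new_soc)                                # Eq. (13)
    if new_fit < fits[c]:                               # Eq. (14)
        pack[c], fits[c] = new_soc, new_fit

f = lambda x: sum(v * v for v in x)                     # hypothetical objective
rng = random.Random(7)
pack = [[2.0, -1.0], [0.5, 0.5], [-3.0, 2.0], [1.0, 1.0]]
fits = [f(s) for s in pack]
before = list(fits)
alpha = pack[min(range(len(pack)), key=lambda k: fits[k])]   # Eq. (5)
cult = culture(pack)
for c in range(len(pack)):
    update_coyote(pack, fits, c, alpha, cult, f, rng)
```

Because of the greedy rule (14), no coyote's fitness can get worse during this step.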


The following is the pseudo-code-2 that illustrates COA [34]-[36]:


The pseudo-code-2. Coyote optimization algorithm COA
Initialize N_p packs with N_c coyotes each by (2)
Calculate the fitness of each coyote by (3)
While the stopping criterion is not met do
    For each pack p do
        Determine the alpha coyote by (5)
        Calculate the culture transformation by (6)
        For each coyote c of pack p do
            Find the new social condition by (12)
            Find the new fitness by (13)
            Update the social condition by (14)
        End for
        Perform the birth and death process by (7) and pseudo-code-1
    End for
    Perform the packs' transitions by (4)
    Update the age of the coyotes
End while
Output the global best coyote
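Pseudo-code-2 can be turned into a compact, self-contained Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: `coa` is a hypothetical name; the problem is a minimization; the association probability is $P_a = (1 - P_s)/2$ as in the original COA formulation [27]; new social conditions are clipped to the bounds; and a pack transition is modeled as a swap of two random coyotes between two random packs, which keeps the pack sizes fixed. The sketch requires $D \geq 2$ and $3 \leq N_c \leq 14$.

```python
import random
import statistics

def coa(f, lb, ub, n_packs=5, n_coyotes=5, iters=100, seed=0):
    """Minimal COA sketch following pseudo-code-2 (minimization)."""
    rng = random.Random(seed)
    D = len(lb)
    p_s = 1.0 / D                          # Eq. (8)
    p_a = (1.0 - p_s) / 2.0                # association probability
    p_e = 0.005 * n_coyotes ** 2           # Eq. (4)

    def rand_soc():                        # Eq. (2)
        return [lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]

    def clip(x):                           # keep solutions inside the bounds (assumption)
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lb, ub)]

    packs = [[rand_soc() for _ in range(n_coyotes)] for _ in range(n_packs)]
    fits = [[f(s) for s in pack] for pack in packs]                     # Eq. (3)
    ages = [[0] * n_coyotes for _ in range(n_packs)]
    for _ in range(iters):
        for pack, fit, age in zip(packs, fits, ages):
            alpha = pack[min(range(n_coyotes), key=lambda c: fit[c])]   # Eq. (5)
            cult = [statistics.median(col) for col in zip(*pack)]       # Eq. (6)
            for c in range(n_coyotes):
                cr1, cr2 = rng.sample([k for k in range(n_coyotes) if k != c], 2)
                rd1, rd2 = rng.random(), rng.random()
                new = clip([s + rd1 * (a - s1) + rd2 * (m - s2)         # Eqs. (10)-(12)
                            for s, a, s1, m, s2 in zip(pack[c], alpha, pack[cr1], cult, pack[cr2])])
                new_fit = f(new)                                        # Eq. (13)
                if new_fit < fit[c]:                                    # Eq. (14)
                    pack[c], fit[c] = new, new_fit
            # Birth, Eqs. (7)-(9), then death, pseudo-code-1
            r1, r2 = rng.sample(range(n_coyotes), 2)
            j1, j2 = rng.sample(range(D), 2)
            pup = []
            for j in range(D):
                rnd = rng.random()
                if rnd < p_s or j == j1:
                    pup.append(pack[r1][j])
                elif rnd >= p_s + p_a or j == j2:
                    pup.append(pack[r2][j])
                else:
                    pup.append(lb[j] + rng.random() * (ub[j] - lb[j]))
            pup_fit = f(pup)
            omega = [c for c in range(n_coyotes) if fit[c] > pup_fit]
            if omega:
                v = omega[0] if len(omega) == 1 else max(omega, key=lambda c: age[c])
                pack[v], fit[v], age[v] = pup, pup_fit, 0
        # Pack transition with probability P_e: swap two coyotes between two packs
        if n_packs > 1 and rng.random() < p_e:
            p1, p2 = rng.sample(range(n_packs), 2)
            c1, c2 = rng.randrange(n_coyotes), rng.randrange(n_coyotes)
            for arr in (packs, fits, ages):
                arr[p1][c1], arr[p2][c2] = arr[p2][c2], arr[p1][c1]
        for age in ages:                                                # age update
            for c in range(n_coyotes):
                age[c] += 1
    best = min(((p, c) for p in range(n_packs) for c in range(n_coyotes)),
               key=lambda pc: fits[pc[0]][pc[1]])
    return packs[best[0]][best[1]], fits[best[0]][best[1]]

sphere = lambda x: sum(v * v for v in x)   # hypothetical test objective
best_x, best_f = coa(sphere, [-5.0] * 3, [5.0] * 3)
```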


2.2. Least squares support vector machine (LS-SVM) 
LS-SVM is a variant of the SVM classifier proposed by Suykens and Vandewalle in 1999 [37]. The goal of the LS-SVM classifier is to find the optimal separating hyperplane in a higher-dimensional feature space using the Euclidean distance [37], [38]. Its advantage is that training requires solving a set of linear equations instead of the computationally expensive quadratic programming problem of the standard SVM [39]. However, LS-SVM is known to be highly sensitive to changes in the values of its parameters. LS-SVM is formulated as follows [39]-[41]:

In the primal weight space, given a training set $\{x_k, y_k\}_{k=1}^{N}$ of $N$ points with input data $x_k \in \mathbb{R}^n$ and output data $y_k \in \mathbb{R}$, the optimization problem is formulated as in (15):

$$\min_{w,b,e} J(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2 \qquad (15)$$

subject to:

$$y_k - \left(w^T \varphi(x_k) + b\right) = e_k, \quad k = 1, 2, \ldots, N \qquad (16)$$


where $\gamma$ is the regularization factor, $e_k$ is the difference between the desired output $y_k$ and the actual output, $\varphi(\cdot)$ is a nonlinear mapping function, $w$ is the weight vector, and $b \in \mathbb{R}$ is the bias term. The resulting linear classifier in the feature space takes the form (17):

$$y(x) = \operatorname{sign}\left(w^T \varphi(x) + b\right) \qquad (17)$$

The problem is solved in the dual space instead of the primal space through the following Lagrangian function:

$$L(w, b, e; \alpha) = J(w, e) - \sum_{k=1}^{N} \alpha_k \left( w^T \varphi(x_k) + b + e_k - y_k \right) \qquad (18)$$


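where $\alpha_k$ are the Lagrange multipliers, called support values. The solution of (18) is optimal when it satisfies the Karush-Kuhn-Tucker (KKT) conditions in (19):

$$\begin{aligned} \frac{\partial L}{\partial w} = 0 &\;\rightarrow\; w = \sum_{k=1}^{N} \alpha_k \varphi(x_k) \\ \frac{\partial L}{\partial b} = 0 &\;\rightarrow\; \sum_{k=1}^{N} \alpha_k = 0 \\ \frac{\partial L}{\partial e_k} = 0 &\;\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \ldots, N \\ \frac{\partial L}{\partial \alpha_k} = 0 &\;\rightarrow\; w^T \varphi(x_k) + b + e_k - y_k = 0, \quad k = 1, \ldots, N \end{aligned} \qquad (19)$$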
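The elimination step can be written out explicitly. This is standard LS-SVM algebra, shown only as a worked derivation: substituting $w = \sum_{l} \alpha_l \varphi(x_l)$ and $e_k = \alpha_k / \gamma$ from (19) into the last condition of (19), and adding the condition $\partial L / \partial b = 0$, i.e. $\sum_k \alpha_k = 0$, gives

$$\begin{aligned}
\sum_{l=1}^{N} \alpha_l \, \varphi(x_l)^T \varphi(x_k) + b + \frac{\alpha_k}{\gamma} &= y_k, \qquad k = 1, \ldots, N, \\
\sum_{k=1}^{N} \alpha_k &= 0.
\end{aligned}$$

With $\Omega_{kl} = \varphi(x_k)^T \varphi(x_l) = K(x_k, x_l)$ and $1_v$ the all-ones vector, the first line reads $(\Omega + \gamma^{-1} I)\,\alpha + b \, 1_v = y$ and the second reads $1_v^T \alpha = 0$; stacking the two gives the block linear system (20).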


The following linear system is obtained after eliminating $w$ and $e$:

$$\begin{bmatrix} 0 & 1_v^T \\ 1_v & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (20)$$


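where $y = [y_1, y_2, \ldots, y_N]^T$, $1_v = [1, 1, \ldots, 1]^T$, $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$, $I$ is the identity matrix, and $\Omega \in \mathbb{R}^{N \times N}$ is the kernel matrix with $\Omega_{kl} = \varphi(x_k)^T \varphi(x_l) = K(x_k, x_l)$. The resulting LS-SVM model for function estimation is given in (21):

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (21)$$

To perform LS-SVM, the radial basis function (RBF) kernel is used, where $\sigma$ is the width parameter of the Gaussian kernel:

$$K(x, x_k) = \exp\left( -\frac{\lVert x - x_k \rVert^2}{\sigma^2} \right) \qquad (22)$$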
The following is the pseudo-code-3 that illustrates LS-SVM [42]-[45]:


The pseudo-code-3. LS-SVM Algorithm

Enter the data set of n data points {(x_i, y_i)}, i = 1, ..., n, where x_i is the i-th input vector and y_i ∈ R is the corresponding i-th target with values {-1, +1}.
For each input data point, randomly generate the weights.
For each input data point, randomly set the initial bias b and error e.
Randomly set the initial values of γ and σ.
Calculate the values of (w, b, e) that minimize the objective function using (15) and (16).
Calculate the Lagrangian function using (18), whose solution must satisfy the KKT conditions in (19).
Calculate the support values α and the bias b by solving (20).
Classify the training data of LS-SVM using (21) with the RBF kernel function (22).
Classify any new data point using (17) and the RBF kernel function (22).
Repeat until the stopping criterion is met, usually until the maximum number of iterations is reached.
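For fixed γ and σ, the core of pseudo-code-3 is a single linear solve of (20) followed by evaluating (21) inside the sign of (17). The stdlib-only Python sketch below shows that core; the function names are hypothetical, a hand-written Gaussian elimination stands in for a linear-algebra library, and the four-point data set is toy data, not PIDD.

```python
import math

def rbf(x, z, sigma):
    """Eq. (22): K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma ** 2)

def solve(A, rhs):
    """Gaussian elimination with partial pivoting (stand-in for a library solver)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for col in range(i, n + 1):
                M[r][col] -= factor * M[i][col]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def lssvm_train(X, y, gamma, sigma):
    """Build and solve the linear system (20); returns (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]                           # first row: [0, 1_v^T]
    for k in range(n):                                # other rows: [1, Omega + I/gamma]
        A.append([1.0] + [rbf(X[k], X[l], sigma) + (1.0 / gamma if k == l else 0.0)
                          for l in range(n)])
    sol = solve(A, [0.0] + [float(t) for t in y])
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sigma, x):
    """Eq. (21) inside the sign of Eq. (17): returns -1 or +1."""
    s = sum(a * rbf(x, xk, sigma) for a, xk in zip(alpha, X_train)) + b
    return 1 if s >= 0 else -1

X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]   # toy data, not PIDD
y = [-1, -1, 1, 1]
b, alpha = lssvm_train(X, y, gamma=10.0, sigma=1.0)
preds = [lssvm_predict(X, b, alpha, 1.0, x) for x in ([0.0, 0.5], [3.0, 3.5])]
```

The first row of (20) forces the support values to sum to zero, which is a convenient sanity check on any implementation.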


3. METHODOLOGY 
3.1. The proposed algorithm COA-LS-SVM 

The proposed algorithm combines two algorithms, COA and LS-SVM, as shown in Figure 1. COA is used in the first stage to obtain the optimal parameters of LS-SVM, and the LS-SVM classifier is used in the second stage to classify patients: i) first stage: optimizing parameters. The goal of this stage is to obtain the optimal parameter values of LS-SVM. COA is used to optimize them to overcome the sensitivity of LS-SVM to changes in its parameter values. These parameters are the regularization factor $\gamma$ and the Gaussian kernel width $\sigma$; and ii) second stage: classification. This stage consists of two steps, training followed by testing. Its goal is to classify the subjects into one of two classes, Healthy or DM.


[Figure 1: Proposed Algorithm COA-LS-SVM. Patients' dataset -> first stage: optimizing LS-SVM parameters using COA -> second stage: LS-SVM classifier -> output: classified patients]

Figure 1. A block diagram of the proposed algorithm COA-LS-SVM


The following is the pseudo-code-4 that explains the proposed algorithm COA-LS-SVM in detail:


The pseudo-code-4. Proposed Algorithm COA-LS-SVM

Enter the data set of n data points {(x_i, y_i)}, i = 1, ..., n, where x_i is the i-th input vector and y_i ∈ R is the corresponding i-th target with values {-1, +1}.
For each input data point, randomly generate the weights.
For each input data point, randomly set the initial bias b and error e.
Calculate the optimal values of γ and σ using pseudo-code-2.
Calculate the optimal values of (w, b, e) for the objective function using (15) and (16).
Calculate the support values α and the bias b by solving (20).
Classify any new data point using (17) and the RBF kernel function (22).
Repeat until the stopping criterion is met, usually until the maximum number of iterations is reached.
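Pseudo-code-4 does not state which quantity COA minimizes when it searches for γ and σ. A common choice, assumed here and not taken from this paper, is the classification error of an LS-SVM trained with the candidate parameters on a training split and evaluated on a held-out validation split. The sketch below repeats a compact LS-SVM (a Gauss-Jordan solve of (20) with the RBF kernel (22)) so that it runs on its own; `lssvm_fit` and `fitness` are hypothetical names, and the toy data are not PIDD. The values γ = 100 and σ = 0.5 from section 4 are used only as an example input.

```python
import math

def lssvm_fit(X, y, gamma, sigma):
    """Solve Eq. (20) for (b, alpha) and return a classifier built from (21) and (17)."""
    def k(u, v):                                   # RBF kernel, Eq. (22)
        return math.exp(-sum((a - c) ** 2 for a, c in zip(u, v)) / sigma ** 2)
    n = len(X)
    # Augmented matrix [A | rhs] of Eq. (20); unknowns are b, alpha_1..alpha_n
    M = [[0.0] + [1.0] * n + [0.0]]
    M += [[1.0] + [k(X[i], X[j]) + (1.0 / gamma if i == j else 0.0) for j in range(n)]
          + [float(y[i])] for i in range(n)]
    for i in range(n + 1):                         # Gauss-Jordan with partial pivoting
        p = max(range(i, n + 1), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        M[i] = [v / M[i][i] for v in M[i]]
        for r in range(n + 1):
            if r != i:
                M[r] = [a - M[r][i] * c for a, c in zip(M[r], M[i])]
    b, alpha = M[0][-1], [row[-1] for row in M[1:]]

    def predict(x):                                # sign of Eq. (21), as in Eq. (17)
        s = sum(al * k(x, xk) for al, xk in zip(alpha, X)) + b
        return 1 if s >= 0 else -1
    return predict

def fitness(params, X_tr, y_tr, X_val, y_val):
    """Hypothetical COA objective: validation error rate for params = [gamma, sigma]."""
    gamma, sigma = params
    predict = lssvm_fit(X_tr, y_tr, gamma, sigma)
    return sum(predict(x) != t for x, t in zip(X_val, y_val)) / len(y_val)

X_tr = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]]
y_tr = [-1, -1, -1, 1, 1, 1]
X_val, y_val = [[0.5, 0.5], [4.5, 4.5]], [-1, 1]
err = fitness([100.0, 0.5], X_tr, y_tr, X_val, y_val)
```

Wrapped as a function of the two-element vector [γ, σ] with positive lower and upper bounds, `fitness` can be handed to any COA implementation that follows pseudo-code-2.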


3.2. Data set 

The PIDD used in this research was obtained from the UCI machine learning repository; full details about it are available in [46]. The data set consists of 768 cases, all of them at least 21 years old. Table 1 and Figure 2 summarize the information and features of this data set.


Table 1. Information of data set

Data set    No. of cases    Input features    Output classes
Pima        768             8                 2
Healthy cases: 500          DM cases: 268

Feature input    Feature name
1                Number of times pregnant
2                Plasma glucose concentration at 2 h in an oral glucose tolerance test
3                Diastolic blood pressure (mm Hg)
4                Triceps skin fold thickness (mm)
5                2-h serum insulin (μU/ml)
6                Body mass index (weight in kg/(height in m)^2)
7                Diabetes pedigree function
8                Age (years)
Class            1 if healthy and 0 if DM patient
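As a reference point when reading the accuracy figures that follow, the class counts in Table 1 give the accuracy of a trivial classifier that always predicts the majority class (Healthy). This quick Python check is illustrative only and is not part of the proposed method.

```python
# Class counts taken from Table 1
healthy, dm = 500, 268
total = healthy + dm
baseline = healthy / total          # accuracy of always predicting "Healthy"
print(f"{total} cases, majority-class accuracy = {baseline:.1%}")
# 768 cases, majority-class accuracy = 65.1%
```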




Figure 2. Features data set 


4. EXPERIMENTAL RESULTS 

The input to COA is the 768 cases of PIDD. The coyotes are randomly initialized in the search domain, and COA is run for 100 iterations. The output of the first stage (COA) of the proposed algorithm is the optimal LS-SVM parameter values, $\gamma = 100$ and $\sigma = 0.5$. These optimal parameters are used in the second stage, the LS-SVM classifier with the RBF kernel function (22), to find the optimal hyperplane that separates the feature space into two classes (Healthy, DM) by calculating the optimal values of $(w, b, e)$ in the objective function (15) subject to (16).

The accuracy metric was used to evaluate the performance of the proposed method [47]-[50]:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (23)$$

where TP (true positive) and TN (true negative) denote the numbers of cases correctly diagnosed, and FN (false negative) and FP (false positive) denote the numbers of cases incorrectly diagnosed. Records with the Healthy label denote positive cases, while records with the DM label denote negative ones.
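Equation (23) together with the labelling convention above can be sketched as follows. The helper names are hypothetical, and the five labels are made up for illustration only.

```python
def confusion_counts(y_true, y_pred, positive="Healthy"):
    """Count TP, TN, FP, FN, treating the Healthy label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Eq. (23): share of correctly diagnosed cases."""
    return (tp + tn) / (tp + tn + fp + fn)

y_true = ["Healthy", "Healthy", "DM", "DM", "Healthy"]
y_pred = ["Healthy", "DM", "DM", "Healthy", "Healthy"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)   # (2 + 1) / 5 = 0.6
```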

The proposed COA-LS-SVM algorithm is validated using k-fold cross validation (K-Fold CV) to estimate its average accuracy. K-Fold CV divides the data into K folds; in each of the K experiments, one fold is used as the test set and the remaining K - 1 folds form the training set [51], [52]. In this work, K = 10: nine folds are used for training and one for testing, and the process is repeated ten times so that every fold is used once as the test set. Figure 3 illustrates the 10-Fold CV.
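The 10-fold split described above can be sketched in a few lines of Python. The shuffling, the seed, and the round-robin assignment of cases to folds are assumptions made for illustration; the paper does not state how the folds were formed.

```python
import random

def k_fold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; every case is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # shuffling is an assumption
    folds = [idx[i::k] for i in range(k)]   # k nearly equal folds
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

splits = list(k_fold_splits(768, k=10))
print([len(test) for _, test in splits])
# [77, 77, 77, 77, 77, 77, 77, 77, 76, 76]
```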

The average testing accuracy of LS-SVM with the RBF kernel function over the 10 folds is 98.811%. Table 2 shows the testing accuracy for each fold of the 10-Fold CV. The performance of the proposed model COA-LS-SVM has been compared with the models of other works that use the PIDD database, whose main objective is to diagnose whether a patient is diabetic or not. The comparison covers works from the past 10 years and uses classification accuracy as the criterion. Table 3 compares the proposed model with these previous works, ordered by classification accuracy, and also lists the number of cases used in each study.


[Figure 3: diagram of the 10-fold cross validation; legend: Training data, Testing data]
Figure 3. 10-fold cross validation


Table 2. The testing accuracy value for each 10-fold CV

Fold No.    Accuracy value
Fold 1      95.953%
Fold 2      96.963%
Fold 3      98.837%
Fold 4      99.98%
Fold 5      98.981%
Fold 6      99.678%
Fold 7      98.9359%
Fold 8      99.99%
Fold 9      99.99%
Fold 10     98.81%
Average     98.811%
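Averaging the ten fold accuracies in Table 2 is a quick consistency check on the reported average; the mean is 98.8118%, which agrees with the reported 98.811% up to rounding in the last digit. The snippet also reports the spread across folds.

```python
# Fold accuracies (%) from Table 2
fold_acc = [95.953, 96.963, 98.837, 99.98, 98.981,
            99.678, 98.9359, 99.99, 99.99, 98.81]
mean_acc = sum(fold_acc) / len(fold_acc)
print(f"mean = {mean_acc:.4f}%, min = {min(fold_acc)}%, max = {max(fold_acc)}%")
# mean = 98.8118%, min = 95.953%, max = 99.99%
```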


Figure 4 depicts the comparison of the proposed model with the previous approaches that used PIDD. The graph shows that this work outperforms the previous approaches: the proposed model COA-LS-SVM reaches the highest average classification accuracy, 98.811%.


Table 3. A comparative study of related research works for average classification accuracy of PIDD

Algorithm                                        Accuracy    No. of cases
PCA, K-means algorithm [53]                      72%         768
RB-Bayes algorithm [15]                          72.9%       768
Cuckoo-fuzzy-KNN [22]                            74.8%       768
DTDN [16]                                        76%         768
SVM [54]                                         78%         460
LR, GB [11]                                      79%         768
Naive Bayes [55]                                 79.56%      768
GB [13]                                          81.25%      768
J48graft [17]                                    81.33%      768
Multi-layer feed-forward neural network [23]     82%         768
Hybrid MLP [18]                                  82.4%       768
GA-BPN [20]                                      84.713%     392
SOM [14]                                         85%         768
Neural network with genetic algorithm [56]       87.46%      768
LDA-MWSVM [57]                                   89.74%      768
AM-MLP [19]                                      89.93%      768
K-means and DT [58]                              90.03%      768
A modified mayfly-SVM [12]                       94.5%       768
Fuzzy, DT, ACO and ANN model [21]                95.852%     247
The proposed algorithm                           98.811%     768


[Figure 4: bar chart of the classification accuracy of each method listed in Table 3; axis: Methods]


Figure 4. The classification accuracies of 
5. CONCLUSION

The diagnosis of Type-II-DM has a significant impact on raising health awareness in the community. Models for diagnosing this disease can therefore help practitioners and patients to avoid its dangers, reduce its complications, and prevent it. To improve the diagnostic performance for Type-II-DM, a hybrid model, COA-LS-SVM, has been proposed. COA was used in the first stage to optimize the parameters of LS-SVM, overcoming the high sensitivity of LS-SVM to changes in its parameter values. The LS-SVM classifier was then employed to classify Type-II-DM. Optimizing the LS-SVM parameters with COA supports the robustness and effectiveness of the proposed model by searching for optimal values instead of relying on trial and error, and makes the classification more accurate and faster. To verify the efficiency of the proposed model, experiments were performed on the PIDD dataset to detect Type-II-DM, and the accuracy of the model was compared with that of other models. The average accuracy of the proposed model was 98.811%, which is higher than that of the previous models implemented on PIDD. In future work, COA can be used as an optimization technique hybridized with other classification algorithms, and other evaluation metrics and kernel functions can be applied.